Is This a Paraphrase? What Kind? Paraphrase Boundaries and Typology
نویسندگان
چکیده
A precise and commonly accepted definition of paraphrasing does not exist. This is one of the reasons that have prevented computational linguistics from a real success when dealing with this phenomenon in its systems and applications. With the aim of helping to overcome this difficulty, in this article, new insights on paraphrase characterization are provided. We first overview what has been said on paraphrasing from linguistics and the new lights shed on the phenomenon from computational linguistics. Under the light of the shortcomings observed, the paraphrase phenomenon is studied from two different perspectives. On the one hand, insights on paraphrase boundaries are set out analyzing paraphrase borderline cases and the interaction of paraphrasing with related linguistic phenomena. On the other hand, a new paraphrase typology is presented. It goes beyond a simple list of types and is embedded in a linguistically-based hierarchical structure. This typology has been empirically validated through corpus annotation and its application in the plagiarism-detection domain.
منابع مشابه
Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs
This paper describes a study on the impact of the original signal (text, speech, visual scene, event) of a text pair on the task of both manual and automatic sub-sentential paraphrase acquisition. A corpus of 2,500 annotated sentences in English and French is described, and performance on this corpus is reported for an efficient system combination exploiting a large set of features for paraphra...
متن کاملPlagiarism Meets Paraphrasing: Insights for the Next Generation in Automatic Plagiarism Detection
Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. Therefore, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attentio...
متن کاملParaphrase Concept and Typology. A Linguistically Based and Computationally Oriented Approach
In this paper, we present a critical analysis of the state of the art in the definition and typologies of paraphrasing. This analysis shows that there exists no characterization of paraphrasing that is comprehensive, linguistically based and computationally tractable at the same time. The following sets out to define and delimit the concept on the basis of the propositional content. We present ...
متن کاملUPPC - Urdu Paraphrase Plagiarism Corpus
Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions is not possible because of the unavailability of benchmark corpora with manual examples of paraphrase plagiarism. To deal with this issue, we present the novel de...
متن کاملMethods for Detecting Paraphrase Plagiarism
Paraphrase plagiarism is one of the difficult challenges facing plagiarism detection systems. Paraphrasing occur when texts are lexically or syntactically altered to look different, but retain their original meaning. Most plagiarism detection systems (many of which are commercial based) are designed to detect word co-occurrences and light modifications, but are unable to detect severe semantic ...
متن کامل